Event-based hyperspace analogue to language for query expansion
Bag-of-words approaches to information retrieval (IR) are effective but assume independence between words. The Hyperspace Analogue to Language (HAL) is a cognitively motivated and validated semantic space model that captures statistical dependencies between words by considering their co-occurrences in a surrounding window of text. HAL has been successfully applied to query expansion in IR, but has several limitations, including high processing cost and use of distributional statistics that do not exploit syntax. In this paper, we pursue two methods for incorporating syntactic-semantic information from textual ‘events’ into HAL. We build the HAL space directly from events to investigate whether processing costs can be reduced through more careful definition of word co-occurrence, and improve the quality of the pseudo-relevance feedback by applying event information as a constraint during HAL construction. Both methods significantly improve performance in comparison with the original HAL, and interpolation of HAL and relevance model expansion outperforms either method alone.
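As a rough illustration of the window-based co-occurrence counting described in this abstract, the sketch below builds a small HAL-style space in Python. It assumes the classic HAL weighting, in which a context word at distance d before the target contributes window - d + 1, and it is not the event-based construction the paper proposes. The function name `build_hal`, the `window` parameter, and the toy sentence are hypothetical.

```python
from collections import defaultdict


def build_hal(tokens, window=5):
    """Build a HAL-style co-occurrence space (illustrative sketch only).

    For each target word, every word that precedes it within `window`
    positions adds a weight of (window - distance + 1), so nearer words
    count more. Returns {target: {preceding_word: weight}}.
    """
    space = defaultdict(lambda: defaultdict(float))
    for i, target in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break  # ran off the start of the text
            space[target][tokens[j]] += window - distance + 1
    return space


# Toy usage: a four-word text with a window of two.
space = build_hal("the quick brown fox".split(), window=2)
print(dict(space["brown"]))
```

In this toy run, "brown" receives more weight from "quick" (distance 1) than from "the" (distance 2). A query-expansion system would go on to combine such vectors for the query terms, a step this sketch leaves out.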
Term selection in information retrieval
Systems trained on linguistically annotated data achieve strong performance for many
language processing tasks. This encourages the idea that annotations can improve any
language processing task if applied in the right way. However, despite widespread
acceptance and availability of highly accurate parsing software, it is not clear that ad
hoc information retrieval (IR) techniques using annotated documents and requests consistently
improve search performance compared to techniques that use no linguistic
knowledge. In many cases, retrieval gains made using language processing components,
such as part-of-speech tagging and head-dependent relations, are offset by significant
negative effects. This results in a minimal positive, or even negative, overall
impact for linguistically motivated approaches compared to approaches that do not use
any syntactic or domain knowledge.
In some cases, it may be that syntax does not reveal anything of practical importance
about document relevance. Yet without a convincing explanation for why linguistic
annotations fail in IR, the intuitive appeal of search systems that ‘understand’ text
can result in the repeated application, and mis-application, of language processing to
enhance search performance. This dissertation investigates whether linguistics can improve
the selection of query terms by better modelling the alignment process between
natural language requests and search queries. It is the most comprehensive work on
the utility of linguistic methods in IR to date.
Term selection in this work focuses on identification of informative query terms of
1-3 words that both represent the semantics of a request and discriminate between relevant
and non-relevant documents. Approaches to word association are discussed with
respect to linguistic principles, and evaluated with respect to semantic characterization
and discriminative ability. Analysis is organised around three theories of language that
emphasize different structures for the identification of terms: phrase structure theory,
dependency theory and lexicalism. The structures identified by these theories play
distinctive roles in the organisation of language. Evidence is presented regarding the
value of different methods of word association based on these structures, and the effect
of method and term combinations.
Two highly effective, novel methods for the selection of terms from verbose queries
are also proposed and evaluated. The first method focuses on the semantic phenomenon
of ellipsis with a discriminative filter that leverages diverse text features. The second
method exploits a term ranking algorithm, PhRank, that uses no linguistic information
and relies on a network model of query context. The latter focuses queries so that 1-5
terms in an unweighted model achieve better retrieval effectiveness than weighted IR
models that use up to 30 terms. In addition, unlike models that use a weighted distribution
of terms or subqueries, the concise terms identified by PhRank are interpretable by
users. Evaluation with newswire and web collections demonstrates that PhRank-based
query reformulation significantly improves performance of verbose queries by up to 14%
compared to highly competitive IR models, and is at least as good for short, keyword
queries with the same models.
Results illustrate that linguistic processing may help with the selection of word associations
but does not necessarily translate into improved IR performance. Statistical
methods are necessary to overcome the limits of syntactic parsing and word adjacency
measures for ad hoc IR. As a result, probabilistic frameworks that discover, and make
use of, many forms of linguistic evidence may deliver small improvements in IR effectiveness,
but methods that use simple features can be substantially more efficient
and equally, or more, effective. Various explanations for this finding are suggested,
including the probabilistic nature of grammatical categories, a lack of homomorphism
between syntax and semantics, the impact of lexical relations, variability in collection
data, and systemic effects in language systems.
Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval
Techniques that compare short text segments using dependency paths (or simply, paths) appear in a wide range of automated language processing applications including question answering (QA). However, few models in ad hoc information retrieval (IR) use paths for document ranking due to the prohibitive cost of parsing a retrieval collection. In this paper, we introduce a flexible notion of paths that describe chains of words on a dependency path. These chains, or catenae, are readily applied in standard IR models. Informative catenae are selected using supervised machine learning with linguistically informed features and compared to both non-linguistic terms and catenae selected heuristically with filters derived from work on paths. Automatically selected catenae of 1-2 words deliver significant performance gains on three TREC collections.
Mainstreaming adult ADHD into primary care in the UK: guidance, practice, and best practice recommendations
BACKGROUND: ADHD in adults is a common and debilitating neurodevelopmental mental health condition. Yet, diagnosis, clinical management and monitoring are frequently constrained by scarce resources, low capacity in specialist services and limited awareness or training in both primary and secondary care. As a result, many people with ADHD experience serious barriers in accessing the care they need. METHODS: Professionals across primary, secondary, and tertiary care met to discuss adult ADHD clinical care in the United Kingdom. Discussions identified constraints in service provision, and service delivery models with potential to improve healthcare access and delivery. The group aimed to provide a roadmap for improving access to ADHD treatment, identifying avenues for improving provision under current constraints, and innovating provision in the longer-term. National Institute for Health and Care Excellence (NICE) guidelines were used as a benchmark in discussions. RESULTS: The group identified three interrelated constraints. First, inconsistent interpretation of what constitutes a ‘specialist’ in the context of delivering ADHD care. Second, restriction of service delivery to limited capacity secondary or tertiary care services. Third, financial limitations or conflicts which reduce capacity and render transfer of care between healthcare sectors difficult. The group recommended the development of ADHD specialism within primary care, along with the transfer of routine and straightforward treatment monitoring to primary care services. Longer term, ADHD care pathways should be brought into line with those for other common mental health disorders, including treatment initiation by appropriately qualified clinicians in primary care, and referral to secondary mental health or tertiary services for more complex cases. 
Long-term plans in the NHS for more joined-up and flexible provision, using a primary care network approach, could invest in developing shared ADHD specialist resources. CONCLUSIONS: The relegation of adult ADHD diagnosis, treatment and monitoring to specialist tertiary and secondary services is at odds with its high prevalence and chronic course. To enable the cost-effective and at-scale access to ADHD treatment that is needed, general adult mental health and primary care must be empowered to play a key role in the delivery of quality services for adults with ADHD.
Sensitivity of SARS-CoV-2 RNA polymerase chain reaction using a clinical and radiological reference standard: Clinical sensitivity of SARS-CoV-2 PCR.
OBJECTIVES: Diagnostic tests for SARS-CoV-2 are important for epidemiology, clinical management, and infection control. Limitations of oro-nasopharyngeal real-time PCR sensitivity have been described based on comparisons of single tests with repeated sampling. We assessed SARS-CoV-2 PCR clinical sensitivity using a clinical and radiological reference standard. METHODS: Between March and May 2020, 2060 patients underwent thoracic imaging and SARS-CoV-2 PCR testing. Imaging was independently double-reported, or triple-reported in cases of discordance, by blinded radiologists according to radiological criteria for COVID-19. We excluded asymptomatic patients and those with alternative diagnoses that could explain imaging findings. Associations with PCR-positivity were assessed with binomial logistic regression. RESULTS: 901 patients had possible/probable imaging features and clinical symptoms of COVID-19 and 429 patients met the clinical and radiological reference case definition. SARS-CoV-2 PCR sensitivity was 68% (95% confidence interval 64-73), was highest 7-8 days after symptom onset (78% (68-88)) and was lower among current smokers (adjusted odds ratio 0.23 (0.12-0.42) p). CONCLUSIONS: In patients with clinical and imaging features of COVID-19, PCR test sensitivity was 68%, and was lower among smokers; a finding that could explain observations of lower disease incidence and that warrants further validation. PCR tests should be interpreted considering imaging, symptom duration and smoking status.
How data science can advance mental health research
Accessibility of powerful computers and availability of so-called big data from a variety of sources means that data science approaches are becoming pervasive. However, their application in mental health research is often considered to be at an earlier stage than in other areas despite the complexity of mental health and illness making such a sophisticated approach particularly suitable. In this Perspective, we discuss current and potential applications of data science in mental health research using the UK Clinical Research Collaboration classification: underpinning research; aetiology; detection and diagnosis; treatment development; treatment evaluation; disease management; and health services research. We demonstrate that data science is already being widely applied in mental health research, but there is much more to be done now and in the future. The possibilities for data science in mental health research are substantial.